Technical Brief: Using Implicit Information to Identify Smoking Status in Smoke-Blind Medical Discharge Summaries

Authors

  • RICHARD WICENTOWSKI
  • MATTHEW R. SYDES
Abstract

As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries, both when explicit smoking terms were present and when those same terms were removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classifier to determine smoking status from the same records after all smoking-related words had been manually removed (the "smoke-blind" dataset). The performance of the Naïve Bayes classifier was compared to that of three human annotators on a subset of the training dataset (n=54) and on the evaluation dataset (n=104). The keyword-based classifier accurately extracted smoking status from discharge summaries that contained explicit smoking words. On the smoke-blind dataset, where explicit smoking cues are not available, two Naïve Bayes systems performed worse than the keyword-based classifier did on the full records, but similarly to three expert human annotators.
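
The two approaches described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword patterns, the three label strings (borrowed from the i2b2 2006 category names), the tokenizer, and the toy training records are all hypothetical choices made for illustration. The keyword classifier only inspects sentences that mention smoking, so it falls back to UNKNOWN on smoke-blind text; the multinomial Naïve Bayes model (add-one smoothing) instead learns from whatever implicit words remain, such as comorbidities.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword patterns; the paper's actual keyword list is not given.
SMOKING = re.compile(r"\b(smok\w*|tobacco|cigarettes?|cigs?|pack[- ]?years?)\b", re.I)
NEGATION = re.compile(r"\b(denies|denied|never|non-?smokers?)\b", re.I)
PAST = re.compile(r"\b(quit\w*|former|ex-?smoker|stopped|remote)\b", re.I)


def keyword_classify(text: str) -> str:
    """Rule-based sketch: decide from the sentences that mention smoking."""
    sentences = [s for s in re.split(r"[.\n]", text) if SMOKING.search(s)]
    if not sentences:
        return "UNKNOWN"  # no explicit cue, as in a smoke-blind record
    joined = " ".join(sentences)
    if NEGATION.search(joined):
        return "NON-SMOKER"
    if PAST.search(joined):
        return "PAST SMOKER"
    return "CURRENT SMOKER"


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (a hypothetical, deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior + sum of smoothed log likelihoods of known words
            score = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

For example, `keyword_classify("He quit smoking in 1990.")` returns `"PAST SMOKER"`, while a smoke-blind sentence such as `"patient with copd and wheezing"` returns `"UNKNOWN"` from the keyword rules but can still be labelled by a `NaiveBayes` model trained on other smoke-blind records.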

Similar resources

Identifying Smoking Status From Implicit Information in Medical Discharge Summaries

Human annotators and natural language applications are able to identify smoking status from discharge summaries with high accuracy when explicit evidence regarding the patient's smoking status is present in the summary. We explore the possibility of identifying smoking status from discharge summaries when these smoking terms have been removed. We present results using a Naïve Bayes classifier on a ...

Emotion Detection in Suicide Notes using Maximum Entropy Classification

An ensemble of supervised maximum entropy classifiers can accurately detect and identify sentiments expressed in suicide notes. Using lexical and syntactic features extracted from a training set of externally annotated suicide notes, we trained separate classifiers for each of fifteen pre-specified emotions. This formed part of the 2011 i2b2 NLP Shared Task, Track 2. The precision and recall of...

Gains from diversification on convex combinations: A majorization and stochastic dominance approach

By incorporating both majorization theory and stochastic dominance theory, this paper presents a general theory and a unifying framework for determining the diversification preferences of risk-averse investors and conditions under which they would unanimously judge a particular asset to be superior. In particular, we develop a theory for comparing the preferences of different convex combination...

Publication date: 2007